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ABSTRACT 

CN We present a fast method of producing mock galaxy catalogues that can be used to 

compute covariance matrices of large-scale clustering measurements and test the meth- 
ods of analysis. Our method populates a 2nd-order Lagrangian Perturbation Theory 
(2LPT) matter field, where we calibrate masses of dark matter halos by detailed com- 

\^^ parisons with N-body simulations. We demonstrate the clustering of halos is recovered 

at ~10 per cent accuracy. We populate halos with mock galaxies using a Halo Occu- 

^_^ pation Distribution (HOD) prescription, which has been calibrated to reproduce the 

clustering measurements on scales between 30 and 80 h~^ Mpc. We compare the sam- 
ple covariance matrix from our mocks with analytic estimates, and discuss differences. 
. . We have used this method to make catalogues corresponding to Data Release 9 of 

^ the Baryon Oscillation Spectroscopic Survey (BOSS), producing 600 mock catalogues 

of the "CMASS" galaxy sample. These mocks enabled detailed tests of methods and 
errors that formed an integral part of companion analyses of these galaxy data. 
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1 INTRODUCTION 



201lt, W iggleZ prinkwater et al.|[2010K HETDEX [Hill et 



, , , T-. ^ n, r. ^- (20041), and the Dark Energy Survey F designed to cover 

Galaxy surveys such as the the Baryon Osculation Spectro- h — '' — „,, , .i , i- . i n- . , 

_• c CTD/^oo rc~ri i — I — 1 I nnnn I Vc^- — „ ^ :„ — I — Tl large areas oi the sky, are currently leading the ertort to con- 

strain cosniological parameters using the observed clustering 



scopic Survey (BOSS, Schlegel et al.|2009a Eisenstein et al 
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of galaxies and quasars. In future, the baton will be passed 



to projects such as eBOSS, BigBOSS, Euclid (Laurejis et 
aL]|2011| and LSST ( |AbeU et al."2009'). These projects will 
cover large volumes of the Universe, and observe millions 
of galaxies in order to make precise measurements. BOSS 
aims to determine the cosmic expansion rate H{z) with a 
precision of 1 per cent at redshifts z ~ 0.3 and z ~ 0.6, and 
with 1.5 per cent at z « 2.5, by means of accurately mea- 
suring the scale of the baryon acoustic peak (Eisenstcin et| 
al.|2011 1 . The first steps towards this goal are presented in a 
companion paper ( Anderson et al.|2012 |, which provides the 
highest precision measurement of the baryon acoustic scale 
to date. 

Such large-scale clustering measurements require an es- 
timate of their covariance matrix in order to produce reliable 
cosmological constraints. One could get this matrix by run- 
ning a large number of N-body simulations and generating 
galaxy mocks. However this is computationally very expen- 
sive and, as surveys probe increasingly larger scales, imprac- 
tical. If only a small number of realizations is used («50 sim- 
ulations), then the estimated covariance matrix can be very 
noisy. There have been several suggestions in the literature 
on how to deal with this problem. 

When analysing SDSS-II DR7 Luminous Red Galaxies, 



Xu et al. (20121 used a smooth approximation to the mock 



covariance matrix. This technique involves fitting an ana- 
lytic form to a covariance matrix that is computed from a 
relatively small number of mock catalogues, using a max- 
imum likelihood approach with a number of underlying 
assumptions. This smoothing technique is critical in the 
regime of a small number of mocks, but would be obsolete if 
a sufficiently large number of mocks were available, requir- 
ing fewer underlying assumptions in the estimation of the 
covariance matrix. Such techniques may also be able to help 
translate matrices between cosmological models. 

Alternatively, the lognormal model has been used to 
generate large numbers of mock catalogues, from which co- 
variance matrices are calculated (Cole et al.||2005 Percival 



et al.|20T0 Blake et al.|2011 l. Because of its simplicity this 
approach is fast. However it does not properly account for 
non-Gaussianities and non-localities induced by non-linear 
gravitational evolution. 

Another method of estimating covariances is Jack-knife 
resampling, which allows errors to be estimated internally. 



directly from the data Krewski ( 1981 1; Shao and Tu ( 1995 1 



It does however require some arbitrary choices (number of 
Jack-knife regions, for example) and its performance is far 
from perfect (see e.g. Norberg et al. 2009). It also will not 
include fluctuations on the scale of the survey. 

Efforts to estimate covariance matrices directly from 
theory, that go beyond a simple rescaling of the linear Gaus- 
sian covariance, must deal with non-linear evolution, shot- 
noise, redshift space distortions, and the complex mapping 
between galaxies and matter (Hamilton, Rimes and Scocci- 
marro 2006; Sefusatti et al. 2006; Pope and Szapudi 2008; 
de Putter 2012; Sefussati et al 2012). 

In this paper we present a new method for generating 
galaxy mocks that is significantly faster than mocks based on 
N-body simulations. This method follows the main ideas put 
forward in the PTHalos method of Scoccimarro and Sheth 
(2002), but the implementation is overall simpler and differs 
significantly in some key aspects; the most relevant being 



that we do not use a merger tree to assign halos within big 
cells of the density field but instead we obtain the halos more 
precisely using a halo finder. This method is fast because it 
is based on a matter field generated using 2nd Order La- 
grangian Perturbation Theory (2LPT), but it still allows us 
to include the most important non-Gaussian corrections rel- 
evant for two-point statistics covariance matrices described 
by the trispectrum. 

We use this method to create 600 mock galaxies cata- 
logues occupying the volume required to accommodate the 
SDSS-IH data release 9 (DR9) BOSS CMASS sample. This 
sample contains around a quarter million high-quality spec- 
troscopic galaxy redshifts between 0.43 < z < 0.7 dis- 
tributed across an angular footprint over 3 000 sq. deg, and 
it represents the largest effective volume of any sample to 
date (see Anderson et al.|2012 for further details) . We apply 
the CMASS DR9 selection function to the mock catalogues 
we create, thereby allowing the calculation of covariance ma- 
trices that include the full effect of the survey geometry. We 
thus provide the means by which statistical errors are deter- 
mined for the CMASS DR9 sample. 

This paper has two parts. First, we describe our method 
for generating PTHalos and compare (and calibrate) it 
with N-body simulations from the LasDamas collaboration 
(McBride et al., in prep.). In the second part, we popu- 
late the PTHalos with mock galaxies in a way that matches 
the CMASS sample. These mocks have been used in several 
analyses of BOSS DR9 data, including the study of system- 
atics (Ross et al. 2012), the determination of the BAO scale 
(Anderson et al. 2012 1, redshift space distortions (Reid et al. 
2012, Samushia et al. 2012), evolution of galaxy bias (Tojeiro 
et al. 2012), the concordance with the ACDM model (Nuza 
et al. 2012), and the full shape of the correlation function 
(Sanchez et al. 2012). Note that the use of the mocks is not 
limited to only providing covariance matrices. For instance, 
by using mocks one can assess the level of expected chance 
correlation between galaxies and systematics (e.g. Ross et 
al. 2012). 

Galaxy PTHalos mocks will be publicly availablaM A 
table with the monopole of the correlation function and the 
covariance matrix is given at the end of the paper. All log 
values in this paper are in base 10. 



2 OVERVIEW OF THE METHOD 

Our goal is to develop a fast method for generating galaxy 
mocks, such that covariance matrices can be computed ac- 
curately for galaxy samples such as the CMASS DR9 ( |An-| 
derson et al.|2012 \ and the methods of analysis can be tested 
for bias and relative accuracy. The basic steps in the method 
can be summarised as follows: 



(i) Create a particle based 2LPT matter field (as de- 
scribed in Section PI . 

(ii) Identify halos using a Friends-of-Friends (FoF, Davis 
et al. 1985) halo-finder with an appropriately chosen linking 
length. We argue that, for the BOSS mean redshift, this link- 
ing length should be 0.38 times the comoving interparticle 
distance; see Section [6] 

^ http://www.marcmanera.net/mocks/ 
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(iii) Assign masses to halos by imposing a mass function 
that agrees with N-body simulations. 

(iv) Populate halos with galaxies using a HOD algorithm 
calibrated to fit the observational data. 

(v) Apply the survey angular mask and galaxy redshift 
distribution. 

We validate the first three steps by comparing our method 
with the clustering of halos in the N-body simulations whose 
halo abundances we have matched. We then apply the final 
steps by calibrating the HOD to the CMASS DR9 dataset. 
Finally, we generate 600 mocks of CMASS galaxies with DR9 
geometry and redshift selection. 

The gain in runtime achieved by generating PTHalos 
galaxy mock catalogues compared to creating mock cata- 
logues from N-body simulations comes from the first step: 
for the particle numbers used here, 2LPT is about three 
orders of magnitude faster than N-body simulations. The 
time taken to make mock catalogues in PTHalos is domi- 
nated by the subsequent steps, and thus the speedup factor 
at the end of the procedure is reduced to about two orders 
of magnitude. 



3 OVERVIEW OF THE BOSS CMASS DR9 
GALAXIES 

BOSS, part of the SDSS-IH (Eisenstein et al. 2011) is an 
ongoing survey measuring spectroscopic redshifts of 1.5 mil- 
lion galaxies, 160,000 quasars and a various ancillary targets. 
BOSS uses SDSS CCD photometry (Gunn et al. 1998,2006) 
from five passbands {u, g, r, i, z; e.g., Fukugita et al. 1996) 
to select targets for spectroscopic observation. 

The BOSS CMASS galaxy sample is selected with 
colour-magnitude cuts, aiming to produce a roughly volume- 
limited sample in the redshift range of 0.4 < z < 0.7, and 
results in a sample that is approximately stellar-mass lim- 
ited. These galaxies have a bias of ~ 2 and most are central 
galaxies of halos of 10^^ Mq, with a non-negligible fraction 
(~ 10 per cent)being satellites in more massive halos (White 
et al. 2011). 

DR9 includes data taken up to the end of July 2011. The 
details of the catalogue and mask used for the large-scale 
structure analyses are explained in [Anderson et ah] ( [2012] ), 
and an analysis of potential systematic effects is presented 
in Ross et al. (2012). DR9 covers approximately 3344 deg^ 
of sky (containing 264,283 usable redshift galaxies) of which 
2635 deg^ (containing 207,246 galaxies) are in the Northern 
Galactic cap (NGC) and 709 deg^ (containing 57,037 galax- 
ies) are in the Southern Galactic cap (SGC), as shown in 
Figure [l] The NGC and SGC have slightly different redshift 
distribution of galaxies; we show their normalised redshift 
distributions, n{z), in Figure [2] NGC and SGC mock cata- 
logues have been generated according to these distributions. 



4 SUMMARY OF 2LPT 

Basics of Lagrangian Perturbation Theory 



The Lagrangian description of structure formation ( Buchert 
[T989{ [Moutarde eFar][T99T] [Hivon et aL][T995| relates the 
current (or Eulerian) position of a mass element, x, to its 
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Figure 1. The Northern Galactic cap (NGC) and Sourthern 
Galactic cap (SGC) footprint of the CMASS DR9 galaxy sam- 
ple 
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Figure 2. Normalised redshift distribution of galaxies in the 
NGC (solid) and SGC (dashed) CMASS DR9 sample 



initial (or Lagrangian) position, q, through a displacement 
vector field ^(q). 



X = q + ^{q). 



(1) 



The displacements can be related to overdensities by (Taylor 
and Hamilton 1996) 



5(k)= /"d^ge-*''(e-**('i'-l) 



(2) 



Analogous to Eulerian perturbation theory, LPT ex- 
pands the displacement in powers of the linear density field, 
Sl, 

* = *(1) +*(2) ^... ^ (3) 

with ^'■"' being n"^ order in Sl- First order in LPT is equiv- 
alent to the well-known Zel'dovich approximation (ZA). 
The equation of motion for particle trajectories x{t) is 



(fx „,, s dx „^ 



(4) 



where V the gradient operator in Eulerian coordinates x 
and r is conformal time. Here $ denotes the gravitational 
potential, and T-i — ^^ — Ha denotes the conformal ex- 
pansion rate. H is the Hubble factor and a the scale factor. 
Substituting M into Q and solving the equation at 
linear order gives the Zel'dovich (1970) approximation (ZA), 



V,.*« = -I?i(r)(q), 



(5) 
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where we have taken a gradient of Eq[4]and used the Poisson 
equation to relate $ and {q). Here (g) denotes the Gaussian 
density field imposed by the initial conditions and -Di(t) 
is the linear growth factor. In Eq [5] the gradient is in La- 
grangian coordinates q, while in EqHjit is in Eulerian coordi- 
nates; the two are related by the Jacobian of the coordinate 
transformation. 

The solution to second order describes the correction to 
the ZA displacement due to gravitational tidal effects and 
reads 



can be derived from the spherical collapse in 2LPT dynam- 



v,.*(^' = Id, (r)^(*w*w -*«*«). 



(6) 



ii'j 



where the comma followed by a coordinate denotes partial 
derivative in that direction. 

Since Lagrangian solutions up to second-order are curl- 
free, it is convenient to define two Lagrangian potentials (^>^^' 
and 0(2' (*(') 
order reads 



aWi 



x{q) = q-D^ \/g(t>'-'> +D2 V 



a(2) 



(7) 



Likewise one can solve for the velocity field, which reads 



Di h H V,<?!.('' +D2 f2H V, 



a(2) 



(8) 



Here ii = ^ is the peculiar velocity, t denotes cosmic time, 
fi = ~£^) and D2 denotes the second order growth factor. 
To better than 0.6 per cent accuracy, 

D2{t) ^ -- n?i' -^^o-l/l43 



'^Dl{T)n-„ 



(9) 



for values of Ha between 0.01 and 1 (Bouchet et al. 1995). 
Lagrangian perturbation theory has been used to model 
baryon acoustic oscillations (Matsubara 2008a, b; Padman- 
abhan & White 2009; Padmanabhan, White & Cohn 2009). 
For a more detailed explanation of 2LPT see Bernardeau et 
al. (2002) and references therein. 

To generate the 2LPT displacement we used an algo- 
rithm that takes advantage of fast Fourier transforms (EFT) 
and is described in detail in Scoccimarro (1998). Although 
this algorithm assumes Gaussian initial conditions, it can 
be extended to treat non-Gaussian initial conditions given 
by any factorisable primordial bispectrum (Scoccimarro et 
al. 2012), and a parallel version of such code is publicly 
availabla_| In this paper we only consider Gaussian initial 
conditions, although the same procedure we describe can be 
applied to the primordial non-Gaussian case. 

Compared to Scoccimarro & Sheth (2002) our imple- 
mentation of 2LPT differs only in the smoothing applied to 
the linear density field before constructing the Zel'dovich 
displacement field. To reduce the effects of orbit crossing 
(where Lagrangian perturbation theory breaks down), they 
impose a cutoff in the linear spectrum, similar to the stan- 
dard truncated Zel'dovich approximation (Coles et al. 1993). 
However, rather than using their merger tree method to 
identify halos, here we identify halos by applying the EoE 
algorithm to the 2LPT field with a modified linking length. 
The theoretical motivation for the choice of linking length 



ics (see Section 6.1 1. In order to preserve this theoretical 



choice, we would like to change the linear density field on 
smoothing scales of the order of the Lagrangian size of halos 
as little as possible, while at the same time not have exces- 
sive orbit crossing effects for the halos that host the galaxies 
we are interested in. These competing requirements become 
increasingly difficult to satisfy as the halo mass we are inter- 
ested in decreases. Although we have not done an exhaus- 
tive investigation, a smoothing window described the linear 
density field Fourier amplitudes multiplied by q~''''-^+'^>'^ 
(with k in h/Mpc) works reasonably well for the halo mass 
range relevant for our purposes; see Section[6] On top of this, 
there is of course a sharp-cutoff in the linear spectrum at the 
Nyquist frequency of the particle grid used to generate the 
fields (with grid size A'^gHd = 1280). 



5 COSMOLOGY AND RESOLUTION 
SPECIFICATIONS 

We have produced halo and galaxy mocks using two differ- 
ent sets of ACDM cosmological parameters. The first set 
has been chosen to match that of the N-body simulations 
we use to calibrate the PTHalos method, while the second 
set has been chosen to have values closer to those expected 
from observations. 

LasDamas Cosmology: 

The fiducial parameters for this cosmology are: 
fi™ = 0.25, Oa = 0.75, Qb = 0.04, h = 0.7, aa = 0.8 and 
Us = 1. These parameters were used by the LasDamas 
collaboratiorrl which produced a suite of large N-body 
cosmological dark matter simulations (McBride et al., in 
prep). These simulations were run with a TreePM code 
Gadget-H (Springel 2005), with a EFT grid size of 2400 
points in each dimension. Each simulation run covers a 
cubical volume of a box size L=2400 Mpc/h, and have 1280^ 
dark matter particles. We have created PTHalos mocks 
assuming the same cosmology and resolution parameters, 
so as to properly compare halo clustering in each of the 40 
N-body simulation runs, and thus calibrate our method. As 
shown in Section [6] we achieve a 10 per cent accuracy in 
the clustering of halos. 

WMAP Cosmology: 

The second ACDM cosmology that we consider has the 
following parameters: fi™, = 0.274, Qa = 0.726, Qt = 0.04, 
h = 0.7, (78 ~ 0.8 and Us = 0.95. These are the same as those 
used to analyse the first semester of BOSS data (White et 



al. 2011) and in the Anderson et al. ( 2012 1 analysis; they are 



within la of the best fit WMAP7 concordance cosmological 
model ( [Larson et al. ||201l"| ). 

We have two simulations of 3000^ particles and cubical 
box size of L=2750 Mpc/h with which we compare the 
clustering of our runs. These simulations were performed 
with the TreePM code described in White (2010), which 
has been compared to a number of other codes and shown 
to achieve the same precision level as other N-body codes 



2 http://cosmo.nyu.edu/roman/2LPT/ 
http://www.marcmanera.net/2LPT/ 



and 



^ http://lss.phy.vanderbilt.edu/lasdamas/ 
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for such simulations ( Heitmann et al.]|2008 |. We use one of 
these simulations in Section \qM 



Resolution Parameters: 

We run 2LPT for our mocks in a cubical box of of size 
L = 2400 Mpc/h with A^ = 1280^ particles. This matches 
the specifications of the Oriana simulations of LasDamas 
suite, and allows us to easily match the Fourier phases in 
2LPT runs to those of the Oriana simulations, thus allow- 
ing a direct comparison for each realisation. With these pa- 
rameters the mass resolution for the LasDamas and WMAP 
cosmologies is Mp„t = 45.7- lO^°M0//i and 50.1- 10^°Mo/fe, 
respectively. The cubical box was matched to the CMASS 
DR9 geometry as explained in Section [8.2[ 



6 PTHALOS 

The first step is to generate a 2LPT field, as described in 
Section [4] which is traced by means of a distribution of par- 
ticles. Based on this field, halo positions and raw masses 
are found by means of a FoF algorithm, which percolates 
all pairs of particles separated by a distance d ^ b. This 
algorithm has become a standard technique and has been 
used extensively in astrophysics and cosmology since Davis 
et al. (1985). Using the LasDamas simulations we calibrated 
the FoF linking length, and set b = 0.38 times the mean 
interparticle separation as the value for generating mocks. 
Note that this is substantially larger than the usual choice, 
b — 0.2, in N-body simulations. Section [6. 1| shows that this 
choice is motivated by 2LPT dynamics. 

The second step of the method is a reassignment of 
halo masses. Respecting the ordering given by the FoF num- 
ber of particles, 2LPT halo masses are changed so that the 
mean mass function of PTHalos matches a given fiducial 
mass function. The underlying understanding here is that 
the ranking of the masses is more accurate than their exact 
values, which will vary according to the definition of halo 
boundaries, both in N-body simulations and 2LPT runs. 

Note that, given a fiducial mass function for PTHalos, 
a fixed 2LPT halo mass always corresponds to the same 
PTHalo mass. That is, the mapping of the masses is between 
the mean of 2LPT realizations of the mass function and the 
targeted fiducial one. In this way, the scatter of the measured 
mass function between 2LPT realisations is translated, as 
expected, into a scatter of the PTHalos mass function. 

In this paper, the PTHalos realisations with the Las- 
Damas cosmology have as a given mass function that of the 
mean of the LasDamas N-body simulations. The PTHalos 
realisations with WMAP cosmology use the mass function of 
Tinker et al. (2008), and adopt the definition of dark matter 
halos that correspond to overdensities 200 times the mean 
background density. 



6.1 Linking length: Theoretical motivation 

The appropriate FoF linking length in Eulerian N-body sim- 
ulations is roughly determined as follows. Given flm and Qa 



one uses a fitting function (see Eq. 11 1 to compute the virial 
overdensity A„ir- of halos within the spherical infall model. 
For the LasDamas cosmology, Avir = 377 times the mean 
background density, at redshift zero. 
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Figure 3. Ratio between PTHalos and N-body halo-matter cross- 
power spectra as a function of linking length, b, for the 10^ most 
massive halos. From top to bottom linking length are: 0.27, 0.30, 
0.36,0.38 (in lighter color), and 0.40. N-body halos use b = 0.2 
with the corresponding mass threshold of 3.02 • 10^'^MQ/h 



Then, assuming an isothermal profile for the dark mat- 
ter halo, one can relate the mean density of the halo to the 
density at the virial radius, i.e, pRvir = Avir/S. This density 
is converted to a mean separation of particles by assuming 
that the density at the virial radius is equal to that of two 
particles in a sphere of radius b. For the LasDamas cosmol- 
ogy, this gives b = 0.156 in units of the mean interparticle 
separation. For an Einstein de Sitter cosmology, f2m — 1, 
b = 0.2, which is the value most commonly used in the lit- 
erature. 

Now, because the 2LPT dynamics is an approximation 
to the true dynamics of the dark matter field, it yields halo 
densities that consistently differ from the N-body densities. 
Consequently, the FoF linking length of 2LPT matter field, 
&2LPT, needs to be rescaled from the value used in N-body 
simulations, bsim- The rescaling is given by 



^bs 



(1/3) 



(10) 



Both the halo virial overdensity in N-body simulations, 
AJ™, and its corresponding value in the 2LPT field, AJ'j™, 
are easy to compute. For the N-body case we take the value 
of Bryan and Norman (1998), 



= (187r2 + 82(n„(^) - 1) - 39(n„(z) - l)^)/n„(z) , 

(11) 



where 



n„^{z)^nrr^ii + zy/H{z), 



(12) 



which gives Avir = 244 at redshift z=0.52. We chose this 
redshift because it is the redshift at which we will compare 
with LasDamas simulation outputs, and it is close to the 
mean redshift of the BOSS CMASS sample, for which we 
want to produce galaxy mock catalogues. 

The Lagrangian A^f/""^ can be easily obtained from the 
relation between Lagrangian and Eulerian fields, which are 
related by the determinant (Jacobian) of the transformation 
ofEq. Q, 
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Figure 4. Ratios between PTHalos and N-body halo-matter 
cross-power spectra as a function of halo mass threshold, for a 
linking length of b=0.38 (2LPT) and b=0.2 (N-body). The halo 
masses are given in Table IT] 



(Det(l + 9*,:/agj))~ 



(13) 



Having solved Equations ([5| and (|6|, and thus knowing 
^ at second order, this equation can be rewritten in terms of 
the growth factor. Then, assuming spherical symmetry for 
simplicity, it reads 



= 1 



So 



(Di 



So 



D2 



(14) 



where So is the over-density at the initial time. Since we 
know from spherical collapse in Eulerian dynamics that a 
halo has virialised when its linear density fluctuation is 
DiSo = 1.686, we can predict the 2LPT over-density of at 
virialization to be A^fJ"^ — 5'^f'/"^ + 1 = 35.43 times the 
mean background density. 

Therefore, using Equation ( |10[ ) we find that to conduct 
a robust comparison between PTHalos halos and N-body 
halos of linking length of b = 0.2, we need to use a linking 
length of fo = 0.38 in the 2LPT field, ft is worth emphasising 
that this predicted value is approximate. The actual value 
has to be confirmed (or otherwise set) by comparing the 
clustering of halos between 2LPT and N-body simulations. 
This processs is described in the next Section, where we find 
that this value works very well at the 10 percent level. 

In principle, one can use spherical overdensity (SO) 
methods to identify halos instead of the FoF algorithm 
( Lacey fc Cole|1994 1 . A similar procedure to that discussed 
in Section |6.1 1 could then be used to match the SO density 
parameter in N-body simulations to 2LPT. 



log TV 


Mass (Afpart) 


Color 


3.5 


44.3 (968) 


Red 


4.0 


30.7 (672) 


Light Green 


4.5 


19.8 (432) 


Blue 


5.0 


11.7 (256) 


Purple 


5.5 


6.31 (138) 


Orange 


6.0 


3.02 (66) 


Cyan 


6.5 


1.28 (46) 


Dark Green 



Table 1. The number of halos, their mass, and associated color. 
Masses of halos in N-body simulations as a function of their po- 
sition in the mass-ranked list. That is, given the A'^ most massive 
halos in the volume L = (2400Mpc//i)^ the lower mass in the 
sample is M. Masses are from run 1001 of Oriana N-body Simu- 
lation at 2 = 0.52 and are given for the linking length of 6 = 0.2. 
Masses are in units of W^^Mq/H and are not corrected for dis- 
creteness effects. For each halo mass we have shown in parentheses 
the number of particles that that halo has given our mass resolu- 
tion. 



given by Eq [TO] We then computed the cross-power spec- 
trum between the 2LPT halos and the N-body matter field, 
Phm.,2ipt{k), and the cross-power spectrum between the N- 
body halos and N-body matter field, Phm,aim{k), where 
these latter halos, from LasDamas, where obtained with 
& = 0.2. 

The comparison between these two spectra gives a mea- 
sure of accuracy of the bias of the 2LPT halos. In particular 
we are interested in comparing the ratio Phm,2ipt / Phm.,sim 
since this is equivalent to the ratio of the halo bias factors. 
Note that we have computed the cross-power spectra and 
not the auto-power spectra since in this case we do not need 
to correct our estimator for shot-noise. 

The results are shown in Figures[3]and[4] Figure[3]shows 
the ratio Phm,2ipt / Phm,sim of the million most massive halos 
as a function of the wavenumber k for different values of the 
linking length. We see that, as we increase the linking length, 
the ratio of the cross-powers decreases. There is a range of 
linking lengths ( 0.36 < b < 0.40 ) for which the ratio of the 
bias is within 10 per cent. 6 — 0.38, as computed using Eq 
1 10| performs best. 

The sample of halos of Figure |3] is equivalent to a mass 
threshold of M = 3.02 • lO^^Mg/Zi. We have found that 
b = 0.38 also fits the other halo samples with different mass 
thresholds within 10 per cent accuracy. In fact, it is the 
linking length that performed best across the range of halo 
masses we consider, and that are shown in Table [l] The 
ratios Phm,2ipt / Phm,aim for thcse halos are plotted in Figure 
[4] with the colours referenced in Table [l] All the curves are 
within 10 per cent accuracy, which justifies our choice to set 
b — 0.38, in accordance with the theoretical expectation, as 
our fiducial value for the 2LPT linking length. 



6.2 Linking length: Calibration with N-body 
simulations 

In order to confirm the linking length that we need to use 
for PThalos, we have run a 2LPT simulation at z=0.52 with 
the same Fourier phases and amplitudes as that of one of 
the Oriana simulations from the LasDamas collaboration. 

We obtained halos from the 2LPT dark matter field 
using FoF with different linking lengths around the value 



6.3 Variance and Cross-Correlation Coefficients 

Having set the PThalos linking length, we run 40 2LPT dark 
matter fields, with the same Fourier phases and amplitudes 
as the 40 simulations of LasDamas suite. For all of them we 
compute the halo-matter cross-power spectra as in the pre- 
vious Section, for the first million halos of both the PTHalos 
and N-body halos, and we compute the corresponding co- 
variance matrices: 
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Figure 5. TOP: Ratio of the cross-power variance of PTHalos 
and N-body simulations for a mass threshold of 3.02 ■ IO^^Mq/H. 
The expected ratio is shown in a solid line and a 15 per cent 
band range is shown in dashed lines. MIDDLE and BOTTOM: 
Comparison of correlation coefficients of N-body (solid blue) and 
PTHalos (dashed red) halo-matter cross-power spectra. Middle 
panel fci = 0.06. Bottom fci = 0.201 




0.005 0.010 0.020 0.050 0.100 0.200 
k [Mpc/h] 

Figure 6. Ratios between 2LPT and N-body halo power spec- 
tra as a function of k for different halo mass thresholds, for a 
linking length of b=0.38 (2LPT) and b=0.2 (N-body). The cor- 
respondence between color and halo mass thresholds are given in 
TablelT] The power spectra have not been corrected for shot-noise. 
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Figure 7. Same as Fig. |6] but with shot-noise corrected power 
spectra, assuming Poisson noise. 



first million halos, which is an equivalent mass threshold of 
3.02 ■ 10^'^A4Q/h. We can see that most points lie within 15 
per cent range of the expected value at linear order, which 
is given by the square of the ratio of the halo bias. Since the 
bias of the halos is accurate at 10 per cent the variance is 
accurate at about 20 per cent. 

In the middle and bottom panels of Figure [5] we show a 
comparison between the correlation coefficients of PTHalos 
(dashed red) and N-body simulations (solid blue). Each line 
shows the mean of the 40 realizations that have the same 
phases. Both are clearly similar, showing that the PTHalos 
preserve the same structures as the N-body simulations. 



O (n^i, Kj 



(15) 



iV 



.. ]V=40 

3Y J2 (Phmikl) - Phm{kl)){Phm{k2) - Phmiki)) 



where Phm is the mean power spectrum of the set of 
simulations. The variance of the cross-power spectrum is 
defined as Cross Variance — Var(fc) — C{k, k) and the 
correlation coefficients of a given fci as Corr(fci,fc) — 
C(fci,/fc)/yVar(fci)Var(fc). 

In the top panel of Figure [5] we show the ratio of the 
variances between PTHalos and N-body simulations, for the 



6.4 Autocorrelation 

In Figure |6J we show the ratio between the auto-power spec- 
trum of PTHalos and the auto-power of N-body halos, where 
we did not subtract a shot noise contribution. We see that 
for most of the masses this ratio is within 10 per cent, except 
for the halos of our lower mass range (dark green line) that 
are significantly more clustered than the N-body halos of 
equivalent mass. This is likely due to a fraction of small ha- 
los being clustered around massive ones, probably because 
of the shell crossing effects that makes halos in 2LPT less 
compact than in N-body simulations. 
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In FigurelTlwe show the ratio of the auto-power spectra 
for one realization of PTHalos and an N-body simulation 
from LasDamas with the same Fourier phases. Before doing 
the ratio, a Poisson shot noise contribution of 1/n (where 
n is the number density of halos) was subtracted from the 
power, as is common under the approximation of Poisson 
sampling. Note however, that there are indications in the 
literature that the shot-noise of halos is not strictly Poisson 
(Appendix A, Smith et al. 2007). As seen in FigurelTlwe re- 
cover a ratio within ~ 20 per cent for most masses and range 
of scales, which is consistent with our findings of an accu- 
racy of 10 per cent in the ratio of the bias (or equivalently 
of the cross-power spectra). At small-scales, for k > 0.15 /i/ 
Mpc, PTHalos performance decreases significantly. This is 
the region between the one halo and the two halo terms of 
the correlation function, and corresponds to the scales that 
are more difficult to model with our simple method. 
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Figure 8. The ratio between the average power spectrum cal- 
culated from PTHalo simulations and the power spectrum calcu- 
lated for halos selected from the White et al. (2011) simulation. 
For both we apply a mass cut of IO^^Mq/H. Poisson shot-noise 
(1/n) has been subtracted. 



6.5 PTHalos with WMAP cosmology 

So far we have established a method to obtain halos from 
a 2LPT dark matter field, which matches the clustering of 
simulations at 10 per cent. We have tested the method by 
comparing the clustering of PTHalos with that from the 
LasDamas N-body simulations suite. 

In the rest of this paper, we use our WMAP fiducial 
cosmology, which is closer to that expected from observa- 
tions. 

Using our PTHalos code we have generated 600 2LPT 
fields at z=0.55. PThalos were obtained using a linking 
length of 6 = 0.38. For these runs, since we cannot use Las- 
Damas mass function to set the mass of the halos and in- 
stead use a general description of Tinker et al. (2008), using 
SO halos corresponding to 200 times the mean background 
density. 

We do not expect the change in cosmological model 
to significantly affect the accuracy of the PTHalos method. 
Nonetheless, we have compared the PTHalos clustering with 
the clustering of the N-body simulation of White et al. 
(2011) for halos above IO^^Mq/Zi. This N-body simulation 
reproduces a piece of the universe with the same cosmo- 
logical parameters that we use in the remaining of the pa- 
per. Their halos are identified with a FoF algorithm with 
b=0.168, but the clustering is still matched at the 10 per 
cent level. This can be seen in Figure [S] where we have plot- 
ted the ratio of the halo power spectrum calculated from 
the N-body simulation and the PThalos method. This re- 
sult shows the robustness of the PTHalos method. 

For this N-body simulation we also show in Figure [9] 
the mass function of the halos together with that of Tinker 
et al. 2008, which is the mass function we used to set the 
masses of PThalos for our fiducial WMAP cosmology. As 
expected the fit is good except at the low mass end where 
the mass resolution effects of our simulation start to become 
important. 
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Figure 9. Comparison of mass functions from the simulation 
of White et at 2011, assuming a friends-of- friends parameter 
b = 0.168, and the Tinker et al. (2008) mass-function fitting func- 
tion for halos corresponding to 200 times the mean background 
density. 



7 POPULATING HALOS WITH GALAXIES 

T.l Halo Occupation Distribution 

To populate halos with galaxies we use a Halo Occupation 
Distribution (HOD; Peacock & Smith 2000, Scoccimarro et 
al. 2001, Berlind & Weinberg 2002) functional form with five 
parameters, as used by Zheng et al. (2007). In this form, the 
mean number of galaxies in a halo of mass M is the sum of 
the mean number of central galaxies plus the mean number 
of satellite galaxies, N(M) = {Ncen{M)) + {Nsat{M)), where 



(Ncen) = 2 



1 + erf 



logA/ - logAf„ 



O'logM 



(Nsat) = (iVcen) 



M - Mo 
Ml 



(16) 



and Nsat = if the halo has M < Mq. The error function 
characterises the scatter between the mass and the luminos- 
ity of the central galaxy, and the power law in the satellite 
occupation term characterises the efficiency of galaxy forma- 
tion on mass. The exact value of the HOD parameters that 
we use was determined by fitting the DR9 galaxy clustering 
data, as explained in Section |8.1| The existence or not of a 
central galaxy in each halo is given by the A^cen probability, 
and the number of satellites will be drawn from a Poisson 
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distribution with mean value Nsat- In the rare event that we 
draw one sateUite galaxy but no central one, we treat it as 
a central. 



7.2 Halo profile 

We have distributed satellite galaxies within a halo following 
a NFW density profile ( [Navarro, Frenk fc White|1996[ ): 



p(r) 



^Ps 



1 + 



(17) 



where r^ is the characteristic radius where the profile has a 
slope of —2, and ps is the density at this radius. The ratio 
between the virial radius Rvir and the characteristic radius 
gives the concentration parameter, 



Thvir 



(18) 



The masses of the halos and their concentrations are 
related. For our galaxy mocks we use the relation found by 
Prada et al. (2011) when fitting data from N-body simula- 
tions. 



c{M, z) 



Bojx) 
Bo(1.393) 

Bijx) 
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(19) 



C(cr') = A 



where 
Bo{x) 

Bi{x) 



= Co + (Ci - Co) 



exp 



1 r / M 1 

— arctan [a(x — xo)\ H — 
n 2 



(Jo (71 (70 



— arctan [I3{x — 2;i)] + - 

TV 2 



(20) 



and the parameters from the N-body fit are: A = 2.881, 
b = 1.257, c = 1.022, d = 0.060, co = 3.681, ci = 5.033, a = 
6.948, xo = 0.424, a^^ = 1.047, (7~^ = 1.646, /3 = 7.386, 
xi = 0.526. 

The cosmology and redshift dependence of the fit enters 
through X — (f^A/f^m)^ /(I + z) and through the variance 
of the halos of a given mass, a{M,z). The masses in the 
equations above are defined such that the mean density at 
the virial radius is 200 times the critical density. Since we are 
using the Tinker et al. (2008) definition to set the masses, 
our PTHalos are closer to 200 times the background density. 
Using the NFW we can easily move from one definition of 
halo mass to another, and use each formula appropriately. 

We have added a dispersion to the mass-concentration 
relation. We use a lognormal distribution, thus the proba- 
bility of a concentration c for a halo of mass M is. 



values of aiogc are between 0.043 and 0.109 (Giocoli et al. 
2010). We have chosen for our mocks the value aiogc ~ 0.078, 
which is close to the mean. 

The scatter of the mass-concentration is not dependent 
on cosmological parameters (Maccio et al.|[2008K 



T.3 Galaxy velocities in halos 

We assign velocities to the galaxies in halos by using the 
virial theorem, which states that the average kinetic energy 
of particles is half the average of the negative potential en- 
ergy, {«^) = {GM{r)/r). This average over the dark matter 
particles can be expressed as an integral of dark matter pro- 
file, 



(«> 



f^fGM{r)dM 



(22) 



Assuming a NFW profile, the mass inside a given radius 
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and therefore, the virial velocity reads 



(v) = 



GM 



c _ ln{l + c) 

2(l + c) 1 + c 



{ln{ 
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(23) 
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TLvir 



0.5c{l + c)~ [1 + c)ln{l + c) 
((1 + c);n(l + c) - c)'' 



F{c) 



where the last equality defines F{c) that we will use later. 
Here, again, c denotes the concentration parameter, c — 
Rvir/rs, rs denotes the characteristic NFW radius, and Rvir 
denotes the virial radius, defined as the radius at which the 
average density of the halo is A times the mean density p, 
M — An/SR^Ap. As mentioned before, the value of A is 
typically taken to be 200, as in PTHalos, but other num- 
bers are also motivated by the spherical collapse model and 
N-body simulations. 

Once we have the typical velocity dispersion of a halo we 
assign positions and velocities to its galaxies in the following 
way. If there is only one galaxy, we place it at the centre of 
mass with the velocity of the halo. If there is more than 
one galaxy, the first one is placed at the centre of mass, 
and the others following the NFW density profile. For these 
galaxies their velocities have two components: the velocity of 
the halo centre of mass, and a contribution from the velocity 
dispersion. We take the latter to be drawn from a Gaussian 
distribution with zero mean and variance equal to 



{«1d) = g("^ +Vl +V^) = g(«^)- 



(24) 
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(21) 



where c is the mean mass-concentration relation. Typical 
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T.4 Redshift Space Distortions 

We use the velocity of galaxies to simulate the effects of 
Redshift Space Distortions (RSD). We therefore alter the 
positions of galaxies such that each galaxy is set to where it 
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would be observed in redshift-space coordinates. To achieve 
this one must convert velocities into displacements by divid- 
ing the former hy H = a = Ha and projecting the result in 
the line-of-sight. The displacement As in Mpc/h that corre- 
sponds to a velocity of magnitude of ■^/{wf^) is easily com- 
puted. Since G = SHo/Snpcrit, pcrit = ^mP, and defining 
the Hubble expansion rate as H{z) = HoE{z), one gets 



Rv 



E{z)a ^ ^ 



(25) 



where F{c) has been defined in Equation 23 In our PTHalos 



galaxy mocks we do not use the distant observer approxima- 
tion (usually implemented in the simulations by displacing 
the particles only in one axis). Instead, we add the RSD 
along the line of sight, as it happens in observations. 
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Figure 10. Best fit HOD of the mocks (black solid line), with 
its contribution split between central galaxies (dashed line) and 
satellite galaxies (dotted line). Grey shadowed area shows the 
mass range for which galaxies are drawn from matter particles. 
White et al. (2011) best HOD fit to CMASS data is shown in red. 



7.4-1 Extending the model 

We made several simplifying assumptions to implement the 
the galaxy population method presented in this paper. Many 
effects of the complex relation between halos, matter and 
galaxies are not included in these mock galaxy catalogs. 

We chose to model the galaxies on top of a static reali- 
sation of the matter field, which assumes the evolution over 
the redshift range is small. This will impact the clustering 
of matter, as well as the associated halo masses. Although 
we expect this effect to be small for the mock galaxy cata- 
logs used in CMASS DR9 results, we could improve on the 
method for future applications and model this evolution. 

For simplicity, the mocks also neglect any evolution to 
populating dark matter halos, or varying the galaxy bias 
with redshift. While the sampling of galaxies is adjusted to 
match the density as a function of redshift (see Fig. pi, a 
change in number density is likely to correspond to a varia- 
tion in galaxy selection, and therefore, the associated galaxy 
bias (more luminous galaxies typically correspond to lower 
number densities and higher bias values). Again, we expect 
a small impact on any CMASS DR9 results (Anderson et 
al. 2012) since much of the modeling assumes an average 
bias value over the redshift range, which the galaxy mocks 
appropriately match. 

We also did not include assembly bias effects (Sheth 
& Tormen 2004, Croft et al. 2011) in our mocks, but kept 
the concentration parameter and HOD independent of the 
environment. For simplicity we also have set independent 
scatters for the number of galaxies in a halo and the con- 
centration parameter, even if at a fixed halo mass they nright 
be related. 

Halos in the mocks are spherical. In reality, as shown 
by N-body simulations, they have a range of shapes that are 
correlated to the morphology of the surrounding environ- 
ment (Smith & Watts 2004, Schneider et al. 2011, White et 
al. 2010). The mocks described in this paper included none 
of these effects. In future versions, a correlation with the 
environment could be introduced via the 2LPT estimation 
of the tidal field. 

Finally, the galaxies in the mocks have no individual 
colors or luminosities. One could include them by follow- 
ing a similar prescription to one described in Skibba et al. 
(2006, 2009) which was constrained by SDSS luminosity and 



colour dependent clustering, number densities and colour- 
magnitude distributions. 



8 GALAXY MOCKS FOR THE CMASS DR9 
SAMPLE 

8.1 Fit to CMASS galaxies 

In order to find values of HOD parameters for the mocks 
of BOSS CMASS DR9 galaxies, we minimize x^ between 
measured BOSS clustering of the full CMASS DR9 sample 
(NGC plus sec) and the clustering of a mock realization. 
We choose the mock realization for which the power spec- 
trum is closest to the mean of the mocks and compute, for 
each HOD iteration, ^(s) with s between 30 and 80 Mpc/h, 
in an area of a quarter of the sky, with a simple mask 
and a constant n{z), but including redshift space distor- 
tions. We populate halos until a minimunr mass threshold 
of M = 0.47 • 10"Mq//i, which corresponds to halos of 10 
particles. The rest of the galaxies, which according to each 
HOD would belong to halos with a lower mass, are placed 
on randonrly selected dark matter particles. 

To find values of HOD parameters that minimize x^ we 
use the simplex algorithm of Nedler and Mead (1969). We 
start by making an initial guess about the values of the HOD 
parameters and then construct a five-dimensional simplex 
with vertices at this initial point and five other points that 
resulted from stepping along each coordinate axes with a 
certain step-size. The algorithm finds the vertex with the 
worst x^ value and moves it by a combination of refiection, 
refiection followed by expansion and multiple contractions 
until the value of x^ a^t that vertex is no longer the worst. 
The algorithm then keeps contracting the simplex by moving 
the next worst vertex until the size of the average distance 
from the centre of the simplex to its vertices is smaller then 
a desired level of accuracy. If the x^ surface is unimodal 
this algorithm is guaranteed to find the minimum with any 
desired accuracy. 

Our initial guess of HOD parameters was the best-fit 
set computed using the clustering and number density of an 
earlier CMASS sample (see White et al. 2011). After about 
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40 steps the resulting best-fit HOD was 
log(M„,„) = 13.09 
log(Mi) = 14.00 
log(Mo) = 13.077 
cTiogAf = 0.596 
a = 1.0127. (26) 

We find x^ = 5.89 witii 9 degrees of freedom. In Figure 10 
we show in black the mean number of galaxies as a function 
of halo mass for our best fit. In red we show the best-fit 
model of White et al. (2011). Both agree to within the 1- 
sigma errors, and the mean number of galaxies at a given 
mass, N{M), agrees better than 10 per cent for halos below 
10^*'^M©//i and better than 20 per cent between lO^*'^ and 
lO^^MQ/h. 

The shadowed area in the plot denotes the masses for 
which we have no halos in the simulation. The galaxies cor- 
responding to those halos have been assigned positions and 
velocities of randomly selected dark matter particles. They 
form ~ 25 per cent of the total of mock galaxies. If we did not 
include them then we would not have recovered a sensible 
HOD because we would have had to populate the available 
low mass PThalos with far too many galaxies in order to 
reduce the bias. 

It is possible to set the HOD parameters of the mocks 
more accurately by fitting both the two and the three point 
correlation functions, as the latter helps to break degenera- 
cies between the parameters ( Sefusatti fc Scoccimarro|2005 



Kulkarni et al. 20071. However, computing the three point 



function in each step of the fitting process is computation- 
ally very time consuming. We leave this improvement as a 
possibility for future versions of the mocks. 

8.2 Geometry and mask 

We wish to create mocks with a geometry appropriate for 
the BOSS CMASS DR9 galaxy sample, including both the 
Northern and Southern Galactic caps, with redshifts be- 
tween 0.43 and 0.7. These are the data used in a number of 
recent cosmological analyses ( Anderson et al.|2012 Sanchez 
et al. 2012; Samushia et al. 2012; Reid et al. 2012; Nuza et al 
2012; Tojeiro et al. 2012a,b; Ross et al. 2012). In this section 
we show how we match the DR9 CMASS geometry. 

The Northern and Southern Galactic cap regions can 
either be fitted into a reshaped box with size L = 2.4Gpc/h, 
which is the size we adopt for our PTHalos runs. The re- 
shaping is achieved as follows. Starting with a cubical box 
of size L, we cut the xy-p\a,ne as indicated in the top panel of 
Figure |11[ Using the periodicity of the PTHalos simulation 
we can copy or move the particles from outside the range 
L/2 < X + y < 3L/2 into that same range. Thus, as shown 
in the second panel from the top of Figure [TT] we can obtain 
a rectangular box of size L/\/2, 2L/-\/2, L. The last dimen- 
sion is defined as the z-direction. This technique is similar 
to volume remapping of Carlson and White (2010). 

With this geometry, placing our observer at {x, y, z) = 
(L/4, L/4, 0) we can cover a quarter of the sky up to a dis- 
tance of L/\/2 from the observer without repetition of the 
underlying matter distribution. This is shown in the third 
panel from the top of Figure |11| For the WMAP cosmolog- 
ical model this distance is equivalent to reaching a redshift 



z — 0.663. Notice, however, that the constraint of a max- 
imum distance of L/\/2 is set only because of the geome- 
try of the z = plane. Keeping the observer in the same 
place, but looking into a direction off the plane, we could 
go to a higher distance without hitting repeated volumes. 
Translating to consider an angular region, the above max- 
imum distance is only valid if we require a full 180-degree 
wide view and, for example, for an opening of 126.87 degrees 
centered on the direction e = {x + y)/\/2 would allow us to 
reach a distance of ^5/8L without repetition. It is clear that 
the actual maximum distance achievable with any given box 
without repetition will depend on the angular mask of the 
survey being analysed. 

To generate the mocks for DR9 CMASS, we first pro- 
duce a redshift shell such as that shown in the bottom panel 
of Figure [TTj We then rotate the 3D coordinates to fit either 
the NGC or SGC angular footprint into the box contain- 
ing the redshift shell. Images of these angular footprints are 
shown in Figure [l] The extent of these masks means that 
our boxes are of sufficient size that mock catalogues contain- 
ing galaxies with redshifts z < 0.7 do not suffer from any 
repetition of the underlying density field. 

In order to mimic the observations as closely as possible, 
we use the MANGLE software (Swanson et al. 2008) to dif- 
ferentiate between sectors that have difi^erent observational 
properties, as described in Ross et al. (2012). The complete- 
ness in the mock galaxies is defined slightly differently from 
that of the CMASS DR9 catalogues. As we are only inter- 
ested in large-scales, we do not mimic the full small-scale 
BOSS targeting procedure in the mocks. In particular, we 
ignore the effect of missing close-pairs of galaxies that re- 
sult from the fact that we cannot observe two targets closer 
than 62" with the same plate; this is a physical limitation 
imposed by the size of the fibres. We also ignore the effect of 
plate-scale angular variations in our redshift success rate. In 
Section 3 of [Anderson et ah] ( [2012] ) two completeness mea- 
sures are defined: the fraction of objects targeted that are 
observed or are in a close-pair, Cboss, and the fraction of 
galaxies with good redshifts, Crod- For the mocks, we revise 
the definition of sector completeness such the angular varia- 
tions in galaxy density follow those in the sample with good 
redshifts, ignoring close-pairs. We therefore define 

Cmock = -TT ^-ry , (27) 

where A'obs is the number of objects observed spectroscopi- 
cally by BOSS in any sector, A^'targ is the number of target 
objects, and A'known is the number that already have good- 



quality known redshifts. Following Anderson et al. (20121, 
the redshift completeness is defined as 



Crcd — 



(28) 



A'obs — A'star 

where A'^gai is the number of targets within a sector, observed 
by BOSS and subsequently spectroscopically classified as 
galaxies with good redshifts, and A'star is the number classi- 
fied as stars. We subsample galaxies in our mock catalogues 
based on the product Cmock x Crcd- i.e. we subsample based 
on angular fiuctuations of galaxies with good redshifts, ig- 
noring other subtleties. The implemented angular mask can 
be seen in Figure [T] 

As we are only interested in matching the large-scale 
clustering signal we do not include small-scale holes in the 
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survey mask such as those due to SDSS fields with know 
photometric problems, objects observed with higher priori- 
ties, bright stars, and plate centres (see Anderson et al.|2012 
for details). In total these remove 5 percent of the mask 
area, as defined by overlapping tiles, and the holes represent 
small angular patches that are approximately randomly dis- 
tributed. As we are only interested in large-scales, the net 
effect on removing such holes is equivalent to reducing the 
galaxy density, rather than the volume. Consequently, we 
simply match the total galaxy number after removing these 
regions from the CMASS DR9 galaxy catalogue. 

In order to mimic the measured redshift distribution we 
subsample the galaxies in each PTHalos mock based on a 
smooth fit to the measured redshift distribution, n{z). We 
do this separately for the NGC and SGC areas, as they have 
slightly different redshift distributions (see Fig. [2| and Ross 
et al. 2012). 

Using the above procedure we generated 600 PTHalos 
mocks with WMAP underlying cosmology. 



9 RESULTS FROM THE CMASS DR9 MOCKS 

9.1 Correlation Function Monopole 

We used the Landy and Szalay (1993) estimator to calculate 
the anisotropic redshift space correlation function, ^(s,/i), 
where s is the redshift-space separation and /i is the cosine 
of the angle between the galaxy pair and the line-of-sight: 



DD{s,^,)-2DR{s,^,) 



(29) 



where D stands for the data number counts and R stands for 
the random sample number counts with the same redshift 
distribution and angular footprint as the data sample. 

The moments of 5(s,/x), expanded in Legendre polyno- 
mials, contain all of the information about the correlation 
function. They are given by 



ii{s) 



(2£+l) 
2 



C(s,^)Pf(^)d^, 



(30) 



We will focus on the monopole ^o and the quadrupole 
^2 (see below) as in linear theory they contain most of the 
information. We weight pair counts based on their num- 
ber density, with weights lo = (1 + n{z)Pfkp)~^ (Feldman, 
Kaiser & Peacock 1994) where Pfkp = 20000. /i"^Mpc^ The 
same applies to the power spectrum. For more details on the 



weighting see Ross et al. (2012) and Anderson et al. (2012 1. 

In the top panel of Figure[l2]we present the mean of the 
monopole of the correlation function 5o(s) from our mocks. 
The red and blue lines show the mean of the 600 mocks using 
the NGC and SGC footprint respectively. The two means are 
similar as expected, and differ only because of cosmic vari- 
ance and differences in the survey geometry. The error bars 
show the RMS of the mocks, and thus give an estimation of 
the typical dispersion between them. The errors are smaller 
for the NGC because of the larger area. The DR9 CMASS 
^o(s) is shown as open circles. 

The relative bias between the data and the mean of the 
NGC mocks is shown in the bottom panel of Figure [12] The 
differences between data and mocks are within 10 per cent 
for most of the scales below 100 Mpc/h. 




Figure 11. Procedure to fit the geometry of DR9 into the simula- 
tion box using periodic boundary conditions. See text for details. 



In Figure [13] we present the distributions of the values 
of the correlation function of the mocks for several sepa- 
ration distances, in normalised units. That is, for each bin 
in s of the correlation function ^(s) one can compute its 
variance and express the value of the correlation function 
in its units. The histogram of the 600 values is also nor- 
malised to one. Thus if the mocks are Gaussian this dis- 
tribution should follow a normalised Gaussian distribution. 
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Figure 13. Histogram of the normalised residual counts of the correlation function l;{s) split into scales from 12 to 192 Mpc/h. 
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Figure 14. Histogram of the normalised residual counts of tlie correlation function ^(s) after being projected into the space where the 
covariance is diagonal. Results split into scales from 12 to 192 Mpc/h. 

© 0000 RAS, MNRAS 000, 000-000 



A Large Sample of Mock Galaxy Catalogues for BOSS 15 




DR9 CMASS 
Southern Mocks 
Northern Mocks 



s (h"' Mpc) 



> 






s (h"' Mpc) 

Figure 12. TOP: Correlation function monopole ^(s) of the 
NGC and SGC mocks, respectively shown in red and blue. The 
NGC footprint, having larger area has smaller errors. CMASS 
DR9 data is shown in open circles. Error bars are from the 600 
galaxy mock catalogues. BOTTOM: The relative bias between 
the mocks and the data, shown for the NGC mocks. 



In red solid lines we show the results for the NGC sample, 
and in blue dashed lines the results for the SGC sample. We 
see no significant deviation from the Gaussian distribution 
shown in black dotted lines, and there is no particular scale 
appearing to perform worse than the others. 

The values of the correlation functions at different scales 
are correlated. To have a better understanding of their dis- 
tribution we have made a transformation of the correlation 
function into the basis where the covariance matrix is diago- 
nal. In Figure[l4]we show the normalised distributions of the 
transformed correlation functions ^t{s), at different scales s. 
In this basis the distribution in each scale is independent 



of the others. In red solid lines we show the results for the 
NGC sample, and in blue dashed lines the results of the SGC 
sample. We see no significant deviation from the Gaussian 
distribution shown in black dotted lines, and, again, there 
is no particular scale appearing to perform worse than the 
others. 

To check the compatibility of the distribution of 
the mocks with a Gaussian distribution we performed 
a Kolmogorov-Smirnov test on the measured distribution 
function of ^t(s) of the NGC sample. The result depends on 
the range of scales used. For scales between 50 < s < 150 
Mpc/h in 9 per cent of the cases, a sample drawn from 
a Gaussian distribution with zero mean and unit variance 
would appear less Gaussian than that the distribution ob- 
tained from the 600 mocks. 



9.2 Correlation Function Quadrupole 

In Figure [15] we show the average measurement of the 
quadrupole for the NGC (red) and SGC (blue) mocks. The 
quadrupole measured from the CMASS DR9 data is shown 
in the open circles. Error bars show the RMS of the 600 
mocks. The anisotropic clustering, i.e., the quadrupole, can 
be used to estimate the growth rate of structure /. 

In the linear regime the expression for the redshift space 
distortions is ( Hamiltonl|1992 \ 



Us)= (bl + hj+^fjcis), (31) 

6(s) = - (^^bj + p'^ [m - e{s)] , (32) 

where ^'^ is the real space matter correlation function nor- 
malised so that 



^ is given by 



m = 



S,'^{s)s ds — 1 , 



^'^(s')s' ds' , 



(33) 



(34) 



and bg is the bias of galaxies. 

We have estimated values of galaxy bias bg and growth 
rate / in the mocks by performing a joint fit to the mea- 
sured redshift space monopole and quadrupole of the corre- 
lation function between the scales of 50 < s < 150Mpc/h. 
We used the standard perturbation theory predictions of the 
real space pairwise halo velocity statistics to model the non- 
linear contribution to the redshift space correlation func- 
tion ( |Reid fc Whitel|2011[ ). The fit results in &g = 1.90 and 
/ — 0.729. The value of the growth rate recovered in this 
fit is very close to the value from general relativity for this 
cosmological parameters, / = 0.744 (only a 2 per cent dif- 
ference). 

Notice that if we were to fit the quadrupole of the cor- 
relation function using only the linear theory to model the 
shape of the multipoles and the linear Kaiser formula for 
redshift-space distortions (Eq|32[), then the recovered best 
value of the fit to / would be lower. This is expected due 
to non-linearities, which act to decrease the redshift-space 
anisotropies predicted by the Kaiser formula, even on rela- 
tively large scales ( Scoccimarro|2004 1. 
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Figure 15. TOP: Correlation function quadrupole £,2{s) of the 
NGC and SGC mocks, respectively shown in red and blue. The 
NGC footprint, having larger area has smaller errors. CMASS 
DR,9 data is shown in open circles. Error bars show the RMS 
of 600 galaxy mock catalogues. BOTTOM: The relative bias be- 
tween the mocks and the data, shown for the NGC mocks. 
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Figure 16. TOP: Power spectrum P(k) of the NGC and SGC 
mocks, respectively shown in red and blue. CMASS DR9 data is 
shown in open circles. Error bars are from the 600 galaxy mock 
catalogues. BOTTOM: Relative bias between the mocks and the 
data, shown for the NGC mocks. The NGC footprint has the 
smaller errors because of its larger area. 



9.3 Power spectrum 

The top panel of Figure |16^ how the average power spec- 
trum of the mocks, both for the NGC and SGC footprints, 
compared with the DR9 CMASS galaxy power spectrum. 
In the bottom panel we show the relative bias between the 
data and the mocks, i.e, the square root of the ratio between 
their respective power spectra. The relative bias is within 10 
per cent for scales between 0.01 < k < 0.2 and increases at 
very low k. 

The amplitude of the power spectrum of the data is 
slightly higher than the average of the mocks. Consequently, 
the mocks underestimate the errors of the the amplitude of 



the measured power spectrum by the same factor. However, 
if what one wants to estimate the position of the BAO peak 
as in Anderson et al. (20121, then and a lower amplitude 



of the mocks would give conservative larger errors on the 
peak position. Thus, if anything, we are over-estimating our 
errors. 

We have checked that the power of the mocks at these 
scales have a scale dependence consistent with the theoreti- 
cal matter power spectrum used as an input, convolved with 
the window function of the survey. 
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10 COMPARISON WITH ANALYTIC 
PREDICTION 

In this Section we compare the covariance matrix of the 
galaxy mocks described above to a covariance matrix based 
on an analytical approach of de Putter et al. (2012). This 
approach provides a prescription for the dark matter power 
spectrum covariance matrix, taking into account the effects 
of survey geometry and using standard perturbation theory 
to include non-linear effects. The resulting covariance matrix 
has been shown to agree well with N-body simulations for 
modes k < 0.2/i/Mpc. However, to analytically describe the 
covariance matrix of the galaxy two point function, the ef- 
fects of galaxy bias, redshift space distortions and shot noise 
need to be taken into account in addition to the dark matter 
prescription. We now describe our simplified assumptions for 
these ingredients below. 

Galaxy bias is assumed to be linear and scale indepen- 
dent, with a value of bg = 1.9, which is the best fit to 
the mocks. Shot noise due to the finite number of galax- 



ies is incorporated following (Feldman, Kaiser & Peackok 



1994), which treats the shot noise as Gaussian. Finally, red- 



shift space distortions are incorporated using the expression 
based on linear the orv and the plane-parallel approxima- 

+ /?(fc • hf)Sg{'k) where 
Q'^^^{z) the growth fac- 



19871 ^^(k) 



tion (Kaiser 

P = f/bg, with / = dlnd/dlna 

tor, and n the line-of-sight unit vector. On large scales, this 

causes a simple rescaling of the covariance matrix by the 

angle average of the fourth power of the "Kaiser factor", 

a^sdiP) = 1 + 4/3/3 + 6/5/3^ + 4/7/3^ + 1/9/3*, which we use 

to multiply the entire covariance matrix. 

Putting it all together, the analytic model for the co- 
variance between galaxy power spectrum estimators in bins 
i and j in fc— space is obtained by symmetrizing 



,gal 



Ji Vk,i J i Vk,j 

X arsd(/3), 



I 7,4 _matt.iion — liii 



(35) 



where v^^i is the fc-volume in a bin i, p{k) is the matter power 
spectrum, and 



a(k) 



J22(k) 
/22(0)' 



S{k) 



Ji2(k) 

/22(0)' 



(36) 



with /ij(k) = J n^(r)w-' {r)e^^'^d'^r, n is the selection func- 
tion of the survey and w = (1-1- nPo)~^ the optimal FKP 



weight function. In Eq. ( 35 I , c™^ '""^ '" describes the non- 



Gaussian matter power spectrum covariance matrix and is 
given by the second and third lines of Eq. (47) in de Putter 
(2012), to which we refer the reader for more details. 

To obtain the covariance matrix of the two-point func- 
tion, this matrix is transformed applying the linear trans- 
formation between the Feldman Kaiser and Peacock (1994) 
power spectrum estimator and the Landy & Szalay (1993) 
correlation function estimator. 

The main caveats in the analytic method come from 
the simplified transformation described above between the 
real-space dark matter covariance matrix and the redshift- 
space galaxy covariance matrix. In reality, the galaxy bias 
is not linear and this affects the non-Gaussian contribution 
to the covariance matrix. Moreover, the analytic model only 
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Figure 17. Correlation coefficients C{k,ki) for the power spec- 
trum of the mocks (in blue solid lines) compared to the analytical 
values (in dashed red lines) 
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Figure 18. Eigenvalues of the normalised covariance matrix 
of the mocks' correlation function (blue circles) compared to an 
smoothed version of it (green diamonds) and to analytical values 
(red squares). 



describes redshift space distortions at the linear level, and 
therefore does not include "fingers of god" effects which ap- 
pear already on weakly non-linear scales. Finally, the shot 
noise also contributes to the non-Gaussian part of the covari- 
ance matrix. However, the analytic description is expected 
to work well in the linear regime, and provides a reason- 
able estimate to compare to the numerical method from the 
mocks in the range of scales of interest (35-140 Mpc//i). 

We now compare the galaxy mock covariance matrix 
with the analytic estimates using the DR9 NGC footprint 
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and assuming our WMAP fiducial cosmology described in 
Section [5l 

We start first with the power spectrum covariance ma- 



trix. Figure 17 shows the normalised (to have unit diagonal) 
covariance matrix or cross-correlation coefficients, C{ki,k), 
of the power spectrum of the mocks (in solid blue lines) and 
of the analytical model (in red dashed lines). The plots are 
for the value k = 0.0624 h/Mpc and k = 0.204 h/Mpc but 
similar results are obtained when fixing fci at other values. 
The mocks have a somewhat stronger correlation amplitude 
than the analytical model, which is not surprising given that 
nonlinear contributions from redshift-space distortions and 
bias are not taken into account in Eq. ( 35 I , as discussed 
above. 

We now turn to configuration space. Figure [18] shows 
the eigenvalues from the mock correlation functions (blue 
circles) compared with the eigenvalues of the analytical 
model (red squares). Both give comparable results, with the 
four largest eigenvalues having differences at the < 10 per 
cent level, which increases up to 25 per cent for the four- 
teenth eigenvalue. 

Figure [18] also shows a comparison with the method 
of Xu et al. (2012), denoted by green diamonds, which is 
based on fitting a modified form of the Gaussian covariance 
matrix to the sample covariance matrix from the mocks us- 
ing a maximum likelihood approach. The eigenvalues of the 
smoothed version of the covariance matrix are consistent at 
the 10 per cent level with the values of the sample covariance 
from the mocks. 



11 CORRELATION FUNCTION AND 
COVARIANCE MATRIX TABLES 

We have created galaxy mocks with the DR9 footprint and 
CMASS redshift selection. They can be used by the commu- 
nity as means of computing covariance matrices for large- 
scale structure analysis. 

Tables 2 and 3 show respectively the mean of the 
monopole of correlation function and the covariance matrix 
of the 600 mocks for the NGC and SGC footprints. The log- 
arithmic binning of the correlation function here is the same 
as in Samushia et al. (2012) and Reid et al. (2012). 

Note that the 600 NGC and SGC mocks are obtained 
from the same 600 primary PTHalos fields. Therefore the 
NGC and SGC mocks are not truly independent. The mea- 
sured correlation between the mocks with the same seeds is 
however small, (3 ± 2) per cent. Due to slight sample varia- 
tion between NGC and SGC samples (Ross et al. 2012), we 
adopt a different fitted n(z) for both. 

PTHalo mocks, tables of the covariance matrices, and 
covariance matrices with different binning will be available 
from the mocks website r] after the DR9 is made public and 
this work is published. Updated version of the mocks will be 
also hosted at this site. 



http://www.marcmanera.net/niocks/ 



12 CONCLUSIONS 

In this paper we have presented a method to quickly pro- 
duce large number of galaxy mocks for large scale structure 
analysis. The method has five steps: 

(i) Generate a dark matter particle field using 2LPT. 

(ii) Obtain halos using a FoF algorithm with an appropri- 
ate linking length, which we have tested to be b=0.38 times 
the mean inter-particle separation at redshift z ~ 0.5. 

(iii) Promote the mass of these 2LPT halos to new PTHa- 
los masses using the transformation function that maps the 
mean mass 2LPT halo mass function to the desired PThalo 
mass function, typically measured or derived from simula- 
tions. 

(iv) Populate the halos with galaxies using an HOD pre- 
scription with the HOD parameters fit to reproduce the cor- 
relation function of the observed survey, in this case CMASS 
DR9 sample. 

(v) Apply survey mask and galaxy selection criteria. 

The time savings compared to doing mock catalogues 
from N-body simulations comes from the first step (where 
for the particle numbers used here, 2LPT is about three 
orders of magnitude faster than N-body simulations). The 
total time spent in making mock catalogues in PTHalos is 
dominated by the subsequent steps, and thus the speedup 
factor at the end of the procedure is reduced to about two 
orders of magnitude. 

We have tested the clustering of the PTHalos generated 
by this method by comparing the halo-matter cross-power 
spectrum of 40 PTHalos realisations with that of 40 Las- 
Damas N-body simulations with the same cosmology, mass 
resolution and Fourier phases. The clustering is recovered to 
within 10 per cent level (see Figure H|. And the correlation 
coefficients show that the PTHalos trace the same structures 
as the N-body simulations (see Figure [5|. 

We have used the LasDamas N-body simulations to test 
the proper linking length value to be used with FoF halos 
from 2LPT fields. We have found that the theoretical moti- 
vated value of 6 ~ 0.38 (Section [6.1| l is the one that performs 
best within the range of values we test against an N-body 
simulation (Section [6. 2[ ). 

We have applied our method to generate 600 galaxy 
mocks catalogues for the DR9 BOSS CMASS galaxies. For 
these mocks we have fixed the mass function of PTHalos 
to that of Tinker et al. (2008), for our cosmology, and set 
the HOD parameters by fitting the DR9 data correlation 
function (see Section 8.1 1. In Sections 9.1 9.3 and 9.2 we 



present the the monopole of the correlation function, the 
monopole of the power spectrum, and the quadrupole of the 
correlation function, and its comparison to the CMASS DR9 
data. In Section [TT] we presented the covariance matrices. 

The 600 mocks were produced using a cubical box re- 
shaped to match BOSS DR9 geometry, separately for both 
NGC and SGC footprints. Redshift space distortions are in- 
cluded. Mocks have been used within the BOSS collabo- 
ration in the analysis of the Baryon Acoustic Oscillations 



(Anderson et al. 20121, redshift space distortions (Reid et 
al. 2012, Samushia et al. 2012), clustering of galaxies below 
lOOMpc/h compared with simulations (Nuza et al. 2012), 
systematics of CMASS DR9 galaxies (Ross et al. 2012), bias 
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evolution (Tojeiro et al. 2012), and fit to the full shape of 
the correlation function (Sanchez et al. 2012). 

Finally, we have compared the covariance matrices to 
analytical covariance matrices and found a good agreement 
with differences less than 10 per cent for the principal eigen- 



values of the covariance of the correlation (Section 10 1. 

The mocks, and the covariance matrices of this paper, 
as well as covariance matrices with other binnings will be 
available form the mocks websitqil 
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Table 2. The monopole of the correlation function multiplied by 10^, for ^o(s) with 30 < s < 160/i^^Mpc for the DR9 NGC and 
SGC mocks. Correlation functions from PTHalos galaxy mock catalogues will be available with this and other binnings will be available 
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Table 3. The covariance matrix, multiplied by lO'', for go(s) with 30 < s < WOh-^Mpc, for the DR9 NGC (top) and SGC 
(bottom) footprint. Covariance matrices from PTHalo galaxy mock catalogues with this and other binnings will be available at 
'[http://www. marcmanera. net/mocks/]'. 
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